Context Distillation
Training Plug-n-Play Knowledge Modules with Deep Context Distillation
Caccia, Lucas, Ansell, Alan, Ponti, Edoardo, Vulić, Ivan, Sordoni, Alessandro
Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to simulate the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets. Finally, we highlight synergies between KMs and retrieval-augmented generation.
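The distillation objective described above can be sketched in plain Python. The function names, the softmax/KL formulation, and the way the hidden-state term is weighted are illustrative assumptions for a single position and a tiny vocabulary, not the paper's implementation:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q) between two token distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dcd_loss(teacher_logits, student_logits,
             teacher_hidden, student_hidden, alpha=1.0):
    """Deep context distillation loss (sketch): match the teacher's
    output distribution (KL term) plus its hidden states (L2 term).
    The teacher sees the document in context; the student (base model
    plus KM/LoRA) does not. `alpha` is an assumed weighting knob."""
    kl = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    hid = sum((t - s) ** 2
              for t, s in zip(teacher_hidden, student_hidden)) / len(teacher_hidden)
    return kl + alpha * hid
```

In a real training loop, both terms would be computed per token and per layer with the KM parameters as the only trainable weights; this sketch only shows the shape of the objective.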
In-Context Learning Distillation for Efficient Few-Shot Fine-Tuning
Duan, Yifei, Li, Liu, Zhai, Zirui, Yao, Jinxia
Conventional solutions to few-shot learning generally fall into two categories: weights-updating fine-tuning and prompt-based context learning. Each approach has significant limitations, particularly when scaling to larger models or deploying in resource-constrained environments. Fine-tuning requires updating some or all model parameters, leading to high computational costs and potential catastrophic forgetting. The authors train a model for the natural language inference task and employ knowledge distillation to internalize the context information, reducing model parameters from 1.3B to 125M and achieving a size reduction from 2.5GB to 0.25GB. Compared to using in-context learning alone on similarly sized models, this context distillation approach achieves a nearly 50% improvement in out-of-domain accuracy.
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Hsu, Sheryl, Khattab, Omar, Finn, Chelsea, Sharma, Archit
The hallucinations of large language models (LLMs) are increasingly mitigated by allowing LLMs to search for information and to ground their answers in real sources. Observing that LLMs can learn to search for relevant facts by trying different queries and learning to up-weight queries that successfully produce relevant results, we introduce Learning to Retrieve by Trying (LeReT), a reinforcement learning framework that explores search queries and uses preference-based optimization to improve their quality. LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%. The simplicity and flexibility of LeReT allow it to be applied to arbitrary off-the-shelf retrievers and make it a promising technique for improving general LLM pipelines.

Despite tremendous progress, large language models (LLMs) still often hallucinate, motivating significant interest in grounding LLM answers in verified sources (Guu et al., 2020; Komeili et al., 2022; PerplexityAI, 2024; Google, 2024; OpenAI, 2024). Unfortunately, simply retrieving documents semantically similar to the user question, as is prevalent in retrieval-augmented generation (RAG; Lewis et al. 2020) pipelines, tends to fail for complex information needs not answered directly by any individual document. To tackle this, multi-hop retrieval pipelines gather information incrementally over multiple steps of search. For example, if a user asks What is a good dinner place driving from the Bay Area to Lake Tahoe on Friday night to avoid traffic?, a grounded system might need to learn about towns en route to Lake Tahoe from the Bay Area, followed by the traffic forecast on I-80 and, finally, restaurants in Auburn (and other towns). However, doing this successfully is hard, as off-the-shelf LLM performance is often unsatisfactory, and producing supervision for the best search queries to generate in a sequence of "hops" is nontrivial and expensive.
Recent work tackles this via prompt optimization and rejection fine-tuning given a downstream signal.
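The core loop LeReT describes, sampling candidate queries, scoring them by retrieval success, and up-weighting the better ones via preference pairs, can be sketched as follows. The `retrieval_reward` metric and the pair-construction rule are simplifying assumptions for illustration, not the paper's exact recipe:

```python
def retrieval_reward(retrieved_ids, gold_ids):
    # Fraction of gold documents recovered by this query's results.
    hits = len(set(retrieved_ids) & set(gold_ids))
    return hits / len(gold_ids)

def build_preference_pairs(queries, retrieve, gold_ids):
    """Score each sampled query by its retrieval reward and emit
    (preferred, rejected) pairs for preference-based optimization
    (e.g., a DPO-style objective over the query generator)."""
    scored = [(q, retrieval_reward(retrieve(q), gold_ids)) for q in queries]
    pairs = []
    for qa, ra in scored:
        for qb, rb in scored:
            if ra > rb:
                pairs.append((qa, qb))
    return pairs
```

In a multi-hop setting, the same scoring would be applied per hop, with each hop's query conditioned on the results of the previous one.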
Efficient LLM Context Distillation
Upadhayayaya, Rajesh, Smith, Zachary, Kottmyer, Christopher, Osti, Manish Raj
Large Language Models (LLMs) demonstrate proficiency across diverse tasks but often require targeted adaptations for specific applications. Various methods have been proposed to facilitate this adaptation, including few-shot fine-tuning, in-context learning, and context distillation. Whereas in-context learning is bounded by the model's constrained context window, context distillation (CD) extends accessible task-specific examples by internalizing them, greatly increasing the number of available examples outside of the query prompt [1].
Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting
Srinivasan, Krishna Prasad Varadarajan, Gumpena, Prasanth, Yattapu, Madhusudhana, Brahmbhatt, Vishal H.
In the domain of large language models (LLMs), arXiv:2305.16938 showed that few-shot full-model fine-tuning -- namely Vanilla Fine Tuning (FT) and Pattern-Based Fine Tuning (PBFT) -- and In-Context Learning (ICL) generalize similarly on Out-Of-Domain (OOD) datasets, but vary in terms of task adaptation. However, both pose challenges, especially in terms of memory requirements. In this paper, we further the understanding of different fine-tuning strategies for LLMs and aim to place a range of these methods on the same footing for an elaborate comparison with full-model fine-tuning on two diverse datasets. To that end, we conducted a series of experiments, beginning with state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI. We then investigate adaptive fine-tuning and the efficiency of LoRA adapters in a few-shot setting. Finally, we also compare an alternative approach that has gained recent popularity -- context distillation -- with vanilla FT and PBFT, with and without a few-shot setup. Our findings suggest that the alternative strategies we explored can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT. PBFT underperforms vanilla FT on out-of-domain (OOD) data, emphasizing the need for effective prompts. Further, our adaptive fine-tuning and LoRA experiments perform comparably to or slightly worse than the standard fine-tunings, as anticipated, since standard fine-tuning updates the entire model. Finally, our context distillation experiments outperform the standard fine-tuning methods. These findings underscore that the choice of an appropriate fine-tuning method ultimately depends on the available resources (memory, compute, data) and task adaptability.
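Since several of the compared methods rely on LoRA adapters, a minimal sketch of the low-rank update may clarify why they are memory-efficient: only the small matrices A and B are trained, while W stays frozen. The matrix shapes and the `scale` parameter here are illustrative assumptions:

```python
def matmul(A, B):
    # Plain-Python matrix multiply for the sketch.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(x, W, A, B, scale=1.0):
    """LoRA forward pass (sketch): y = xW + scale * x(AB), where
    A (d x r) and B (r x d) with small rank r form the trainable
    low-rank update while the base weight W is frozen."""
    base = matmul(x, W)
    low_rank = matmul(matmul(x, A), B)
    return [[b + scale * l for b, l in zip(rb, rl)]
            for rb, rl in zip(base, low_rank)]
```

With rank r much smaller than the model dimension d, the trainable parameter count drops from d*d to 2*d*r per adapted weight matrix, which is what makes few-shot LoRA fine-tuning cheap relative to full-model FT.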
Learning by Distilling Context
Snell, Charlie, Klein, Dan, Zhong, Ruiqi
Language models significantly benefit from context tokens, such as prompts or scratchpads. They perform better when prompted with informative instructions, and they acquire new reasoning capabilities by generating a scratchpad before predicting the final answers. However, they do not internalize these performance gains, which disappear when the context tokens are gone. Our work proposes to apply context distillation so that a language model can improve itself by internalizing these gains. Concretely, given a synthetic unlabeled input for the target task, we condition the model on "[instructions] + [task-input]" to predict "[scratch-pad] + [final answer]"; then we fine-tune the same model to predict its own "[final answer]" conditioned on the "[task-input]", without seeing the "[instructions]" or using the "[scratch-pad]". We show that context distillation is a general method to train language models, and it can effectively internalize 3 types of training signals. First, it can internalize abstract task instructions and explanations, so we can iteratively update the model parameters with new instructions and overwrite old ones. Second, it can internalize step-by-step reasoning for complex tasks (e.g., 8-digit addition), and such a newly acquired capability proves to be useful for other downstream tasks. Finally, it can internalize concrete training examples, and it outperforms directly learning with gradient descent by 9% on the SPIDER Text-to-SQL dataset; furthermore, combining context distillation operations can internalize more training examples than the context window size allows.
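The data-construction step this abstract describes can be sketched directly: the teacher prompt carries the instructions and the generation carries the scratchpad, but the student example keeps only the task input and the final answer. The assumption that the answer sits on the last line of the teacher's output is an illustrative convention, not the paper's format:

```python
def build_distillation_example(instructions, task_input, teacher_generate):
    """Context distillation data construction (sketch): the teacher is
    conditioned on "[instructions] + [task-input]" and produces
    "[scratch-pad] + [final answer]"; the student target drops the
    instructions and scratch-pad, keeping only the final answer."""
    teacher_output = teacher_generate(instructions + "\n" + task_input)
    # Assumed convention: the final answer is the last line of the output.
    final_answer = teacher_output.strip().splitlines()[-1]
    return {"input": task_input, "target": final_answer}
```

Fine-tuning the same model on many such (input, target) pairs is what internalizes the instructions and scratchpad reasoning into the weights.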